Goto

Collaborating Authors

 online algorithm


Combinatorial Ski Rental Problem: Robust and Learning-Augmented Algorithms

Neural Information Processing Systems

We introduce and study the Combinatorial Ski Rental (CSR) problem, which involves multiple items that can be rented or purchased, either individually or in combination. At each time step, a decision-maker must make an irrevocable buy-orrent decision for items that have not yet been purchased, without knowing the end of the time horizon. We propose a randomized online algorithm, Sorted Optimal Amortized Cost (SOAC), that achieves the optimal competitive ratio. Moreover, SOACcan be extended to address various well-known ski rental variants, including the multi-slope, multi-shop, multi-commodity ski rental and CSRwith upgrading problems. Building on the proposed SOACalgorithm, we further develop a learningaugmented algorithm that leverages machine-learned predictions to improve the performance of CSR. This algorithm is capable of recovering or improving upon existing results of learning-augmented algorithms in both the classic ski rental and multi-shop ski rental problems. Experimental results validate our theoretical analysis and demonstrate the advantages of our algorithms over baseline methods for ski rental problems.


Fairness-Regularized Online Optimization with Switching Costs

Neural Information Processing Systems

Fairness and action smoothness are two crucial considerations in many online optimization problems, but they have yet to be addressed simultaneously. In this paper, we study a new and challenging setting of fairness-regularized smoothed online convex optimization with switching costs. First, to highlight the fundamental challenges introduced by the long-term fairness regularizer evaluated based on the entire sequence of actions, we prove that even without switching costs, no online algorithms can possibly achieve a sublinear regret or finite competitive ratio compared to the offline optimal algorithm as the problem episode length T increases. Then, we propose FairOBD(Fairness-regularized Online Balanced Descent), which reconciles the tension between minimizing the hitting cost, switching cost, and fairness cost.


Online Two-Stage Submodular Maximization

Neural Information Processing Systems

Given a collection of monotone submodular functions, the goal of Two-Stage Submodular Maximization (2SSM) [Balkanski et al., 2016] is to restrict the ground set so an objective selected u.a.r.



Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems

Neural Information Processing Systems

This paper leverages machine-learned predictions to design competitive algorithms for online conversion problems with the goal of improving the competitive ratio when predictions are accurate (i.e., consistency), while also guaranteeing a worstcase competitive ratio regardless of the prediction quality (i.e., robustness). We unify the algorithmic design of both integral and fractional conversion problems, which are also known as the 1-max-search and one-way trading problems, into a class of online threshold-based algorithms (OTA). By incorporating predictions into design of OTA, we achieve the Pareto-optimal trade-off of consistency and robustness, i.e., no online algorithm can achieve a better consistency guarantee given for a robustness guarantee. We demonstrate the performance of OTA using numerical experiments on Bitcoin conversion.


3ce3bd7d63a2c9c81983cc8e9bd02ae5-Supplemental.pdf

Neural Information Processing Systems

We start by restating the setup in which our algorithm operates. The type of ICA considered in our work assumes the following generative model. There are dsources recorded T times forming the columns of S:= [s1,...,sT] Rd T whose components s1t,...,sdt are assumed non-Gaussian and independent. Without loss of generality, we assume that each source has zero-mean, unit variance, and finite and distinct kurtosis, a common assumption among kurtosis-based ICA methods [12]. The kurtosis of a random variable v is defined as kurt[v] = E (v E(v))4 / E (v E(v))2 2. Finally, sources are assumed to be mixed through a linear system, i.e., there exists a full rank mixing matrix, A Rd d, producing the d-dimensional mixture, xt, expressed as xt = Ast t {1,...,T} .



25c5133ad2ab138f448b71b3c7345ec3-Paper-Conference.pdf

Neural Information Processing Systems

We propose a model for online graph problems where algorithms are given access to an oracle that predicts (e.g., based on modeling assumptions or on past data) the degrees of nodes in the graph. Within this model, we study the classic problem of online bipartite matching, and a natural greedy matching algorithm called MinPredictedDegree, which uses predictions of the degrees of offline nodes. For the bipartite version of a stochastic graph model due to Chung, Lu, and Vu where the expected values of the offline degrees are known and used as predictions, we show that MinPredictedDegree stochastically dominates any other online algorithm, i.e., it is optimal for graphs drawn from this model. Since the "symmetric" version of the model, where all online nodes are identical, is a special case of the well-studied "known i.i.d.


Online_Knapsack_with_Predictions (6)

Neural Information Processing Systems

There has been recent interest in using machine-learned predictions to improve the worst-case guarantees of online algorithms. In this paper we continue this line of work by studying the online knapsack problem, but with very weak predictions: in the form of knowing an upper and lower bound for the number of items of each value. We systematically derive online algorithms that attain the best possible competitive ratio for any fixed prediction; we also extend the results to more general settings such as generalized one-way trading and two-stage online knapsack. Our work shows that even seemingly weak predictions can be utilized effectively to provably improve the performance of online algorithms.


Online Reinforcement Learning for Mixed Policy Scopes

Neural Information Processing Systems

Combination therapy refers to the use of multiple treatments - such as surgery, medication, and behavioral therapy - to cure a single disease, and has become a cornerstone for treating various conditions including cancer, HIV, and depression. All possible combinations of treatments lead to a collection of treatment regimens (i.e., policies) with mixed scopes, or what physicians could observe and which actions they should take depending on the context. In this paper, we investigate the online reinforcement learning setting for optimizing the policy space with mixed scopes. In particular, we develop novel online algorithms that achieve sublinear regret compared to an optimal agent deployed in the environment. The regret bound has a dependency on the maximal cardinality of the induced state-action space associated with mixed scopes. We further introduce a canonical representation for an arbitrary subset of interventional distributions given a causal diagram, which leads to a non-trivial, minimal representation of the model parameters.